12/03/2019

Motivation

Standard Approach




In meteorology or climate, it is common to pool data or consider problems on a region by region basis.


This can make statistical problems more tractable.

Climate example

National Resource Management Regions

Post-processing example

Question






How should we assign regions for the analysis of extremes?

Application

Assign regions that are likely to experience similar impacts

Regionalisation

Use these regions to inform our statistical analysis

Outline

1. Regionalisation

  • Clustering
  • Dependence of bivariate extremes
  • Practicalities
  • Classification

2. Visualise spatial dependence

  • Max-stable processes

3. Spatial post-processing

Clustering

Clustering Distance

Form clusters based on extremal dependence!
(Bernard et al. 2013)

  • Use only the raw annual maxima
  • No information about climate or topography

Clustering Distance

Use the F-madogram distance
(Cooley et al. 2006) \[d(x_i, x_j) = \tfrac{1}{2} \mathbb{E} \left[ \left| F_i(M_{x_i}) - F_j(M_{x_j}) \right| \right]\] where \(M_{x_i}\) is the annual maximum rainfall at location \(x_i \in \mathbb{R}^2\) and \(F_i\) is the distribution function of \(M_{x_i}\).

This distance can be estimated non-parametrically.
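One way to form the non-parametric estimate is to replace each \(F_i\) with the empirical distribution function of the annual maxima at that station and average the absolute differences over years. The sketch below assumes two equally long series of annual maxima observed in the same years; the function names and the rank/(n+1) scaling are illustrative choices, not necessarily those used in the original analysis.

```python
def empirical_cdf_values(series):
    """Evaluate the empirical CDF of a series at its own values.

    Uses rank / (n + 1); tied values share their average rank.
    """
    n = len(series)
    order = sorted(range(n), key=lambda t: series[t])
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < n and series[order[end + 1]] == series[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return [r / (n + 1) for r in ranks]


def fmadogram_distance(maxima_i, maxima_j):
    """Estimate d(x_i, x_j) = 0.5 * E|F_i(M_i) - F_j(M_j)| from paired maxima."""
    if len(maxima_i) != len(maxima_j):
        raise ValueError("both stations need maxima for the same years")
    u = empirical_cdf_values(maxima_i)
    v = empirical_cdf_values(maxima_j)
    return 0.5 * sum(abs(a - b) for a, b in zip(u, v)) / len(u)


# Hypothetical annual maxima (mm) at two stations over the same five years.
station_a = [41.0, 55.2, 38.7, 62.1, 47.3]
station_b = [39.5, 58.0, 35.1, 60.4, 49.9]
print(fmadogram_distance(station_a, station_b))
```

Because only ranks enter the estimate, it is unchanged by any increasing transformation of either series, which is why no marginal model is needed.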

Extremal Coefficient

For \(M_{x_i}\) and \(M_{x_j}\) with common GEV marginals, the extremal coefficient \(\theta(x_i - x_j)\) is defined by \[\mathbb{P}\left( M_{x_i} \leq z, M_{x_j} \leq z \right) = \left[\mathbb{P}(M_{x_i}\leq z)\,\mathbb{P}(M_{x_j}\leq z) \right]^{\tfrac{1}{2}\theta(x_i - x_j)}.\]

The range of \(\theta(x_i - x_j)\) is \([1 , 2]\): \(\theta = 1\) corresponds to complete dependence and \(\theta = 2\) to independence.

We can write our distance measure as a function of the extremal coefficient, \(\theta(x_i - x_j)\), \[d(x_i, x_j) = \dfrac{\theta(x_i - x_j) - 1}{2(\theta(x_i - x_j) + 1)}.\]

Therefore the range of \(d(x_i, x_j)\) is \([0 , 1/6]\).
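The endpoints of this range can be checked by evaluating the relation at \(\theta = 1\) and \(\theta = 2\), and the relation can be inverted to recover \(\theta\) from a distance. A minimal sketch using exact fractions; the function names are illustrative.

```python
from fractions import Fraction


def distance_from_theta(theta):
    """F-madogram distance implied by an extremal coefficient theta in [1, 2]."""
    return (theta - 1) / (2 * (theta + 1))


def theta_from_distance(d):
    """Invert the relation: theta = (1 + 2d) / (1 - 2d), for d in [0, 1/6]."""
    return (1 + 2 * d) / (1 - 2 * d)


print(distance_from_theta(Fraction(1)))     # complete dependence
print(distance_from_theta(Fraction(2)))     # independence
print(theta_from_distance(Fraction(1, 6)))  # back to theta at the upper endpoint
```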

K-Medoids Clustering

Partitioning around Medoids (PAM): (Kaufman and Rousseeuw 1990)

  1. Randomly select an initial set of \(K\) stations. These form the initial set of medoids.
  2. Assign each station, \(x_i\), to its closest medoid, \(m_k\), based on the F-madogram distance.
  3. For each cluster, \(C_k\), update the medoid according to \[m_k = \mathop{\mathrm{argmin}}\limits_{x_i \in C_k} \sum_{x_j \in C_k} d(x_i, x_j).\]
  4. Repeat steps 2 and 3 until the medoids are no longer updated.
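The steps above can be sketched directly on a precomputed distance matrix. This follows the four steps as listed (assign, then update the medoids) and does not attempt the swap phase of the full PAM algorithm; `k_medoids`, `assign` and their arguments are illustrative names, and `dist` is assumed to be a symmetric matrix of pairwise F-madogram distances.

```python
import random


def assign(dist, medoids):
    """Step 2: label each station with the index of its closest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
        for i in range(len(dist))
    ]


def k_medoids(dist, k, seed=0, max_iter=100):
    """Cluster stations 0..n-1 into k groups from a pairwise distance matrix."""
    n = len(dist)
    rng = random.Random(seed)
    # Step 1: random initial medoids.
    medoids = sorted(rng.sample(range(n), k))
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Guard for an empty cluster: keep the previous medoid.
                new_medoids.append(medoids[c])
                continue
            # Step 3: the member with the smallest total distance to the others.
            best = min(members, key=lambda i: sum(dist[i][j] for j in members))
            new_medoids.append(best)
        # Step 4: stop once the medoids no longer change.
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


# Toy distances: two well-separated groups of three "stations" on a line.
points = [0, 1, 2, 10, 11, 12]
toy_dist = [[abs(a - b) for b in points] for a in points]
medoids, labels = k_medoids(toy_dist, k=2)
print(medoids, labels)
```

Since the result can depend on the random start, it is common to rerun with several seeds and keep the solution with the smallest total within-cluster distance.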

Result

SWWA EA

Assumptions

Truth

Example

Consider the Euclidean distance \(\| x_i - x_j \|\) as the clustering distance.

Density example

Back to our example

Gridded data

Hierarchical Clustering

  1. Each station starts in its own cluster
  2. For each pair of clusters, \(C_k\) and \(C_{k'}\), define the distance between the clusters as \[d(C_k, C_{k'}) = \frac{1}{|C_k| |C_{k'}|} \sum_{x_k \in C_k} \sum_{x_{k'} \in C_{k'}} d(x_k, x_{k'}).\]
  3. Merge the two clusters with the smallest distance
  4. Update the distances relative to the new cluster
  5. Repeat steps 3 and 4 until all points are combined in a single cluster
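The steps above amount to agglomerative clustering with average linkage. A minimal sketch, assuming a precomputed symmetric distance matrix `dist`; for clarity it recomputes cluster distances at every merge rather than updating them incrementally, and the helper names (`average_linkage`, `cut_tree`) are illustrative.

```python
def cluster_distance(dist, members_a, members_b):
    """Average pairwise distance between two clusters (step 2)."""
    total = sum(dist[i][j] for i in members_a for j in members_b)
    return total / (len(members_a) * len(members_b))


def average_linkage(dist):
    """Return the merge history as (cluster_id_a, cluster_id_b, height) tuples."""
    n = len(dist)
    clusters = {i: [i] for i in range(n)}  # step 1: one cluster per station
    merges = []
    next_id = n
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                d = cluster_distance(dist, clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Steps 3 and 4: merge the closest pair; distances to the new
        # cluster are recomputed on the next pass through the loop.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges


def cut_tree(merges, n, height):
    """Replay merges up to a cut height and return one label per station."""
    clusters = {i: [i] for i in range(n)}
    next_id = n
    for a, b, d in merges:
        if d > height:
            break  # average-linkage heights never decrease, so we can stop
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    labels = [0] * n
    for label, members in enumerate(sorted(clusters.values(), key=min)):
        for i in members:
            labels[i] = label
    return labels


# Toy distances: two well-separated groups of three "stations" on a line.
points = [0, 1, 2, 10, 11, 12]
toy_dist = [[abs(a - b) for b in points] for a in points]
history = average_linkage(toy_dist)
print(cut_tree(history, len(points), height=5.0))
```

Unlike K-medoids, the number of regions is not fixed in advance: it follows from where the merge history is cut.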

Hierarchical Clustering

\[d(C_k, C_{k'}) = \frac{1}{|C_k| |C_{k'}|} \sum_{x_k \in C_k} \sum_{x_{k'} \in C_{k'}} d(x_k, x_{k'}).\]

PICTURE

Classify

  • Classify a station relative to its closest neighbours
  • Use a weighted classification \(w\)-kNN
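A weighted nearest-neighbour rule can be sketched as follows. The slides do not specify the weights, so this assumes inverse-distance weights on geographic (Euclidean) distance; `weighted_knn_label` and its arguments are illustrative names.

```python
import math
from collections import defaultdict


def weighted_knn_label(target, stations, labels, k=3):
    """Classify `target` from its k nearest stations, weighting votes by 1/distance.

    The inverse-distance weighting is an assumption made for this sketch.
    """
    nearest = sorted(
        (math.dist(target, station), label)
        for station, label in zip(stations, labels)
    )[:k]
    votes = defaultdict(float)
    for distance, label in nearest:
        # Small constant avoids division by zero when target sits on a station.
        votes[label] += 1.0 / (distance + 1e-12)
    return max(votes, key=votes.get)


# Hypothetical station coordinates and their cluster labels.
stations = [(0.1, 0.0), (5.0, 0.0), (0.0, 5.0)]
labels = ["A", "B", "B"]
print(weighted_knn_label((0.0, 0.0), stations, labels, k=3))
```

In this toy case an unweighted majority vote would pick "B", while the weighted vote favours the much closer "A" station.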

IMAGE

Results

SHINY APP

Choosing a cut height

IMAGE

Similar Dependence

Where can we assume a common dependence structure for extremes?

Max-stable processes

Max-stable process

Shiny App

Level curves

Visualising Dependence

SWWA

TAS

Relevance to post-processing

Oesting et al. 2017

  • approach

  • cut the region into two

Conclusions